Targeting Specific Distributions of Trajectories in MDPs

Authors

  • David L. Roberts
  • Mark J. Nelson
  • Charles Lee Isbell
  • Michael Mateas
  • Michael L. Littman

Abstract

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
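The sketch below illustrates the trajectory-tree idea from the abstract. It is not the authors' implementation. All representations here are hypothetical choices:

- a trajectory is a tuple of states;
- the author's target distribution is a dict mapping complete trajectories to probabilities;
- each action's outcome at a tree node is a dict giving the probability of reaching each child prefix.

The code builds the tree and derives the locally consistent target at a node as the ratio of subtree masses. It then searches for a non-deterministic policy (a distribution over actions) whose induced child distribution matches that target. The fitting step is a swapped-in heuristic, exponentiated-gradient descent on squared error. It is not the heuristic described in the paper.

```python
import math
from collections import defaultdict


def build_trajectory_tree(targets):
    """Return (children, mass) for a dict {complete_trajectory: probability}.

    children[prefix] is the set of one-step extensions of prefix seen in the
    targets; mass[prefix] is the total target probability of all complete
    trajectories that begin with prefix.
    """
    children = defaultdict(set)
    mass = defaultdict(float)
    for traj, p in targets.items():
        for k in range(1, len(traj) + 1):
            prefix = traj[:k]
            mass[prefix] += p
            if k > 1:
                children[traj[:k - 1]].add(prefix)
    return children, mass


def desired_child_distribution(node, children, mass):
    """Locally consistent target at a node: P(child | node) = mass(child) / mass(node)."""
    return {c: mass[c] / mass[node] for c in sorted(children[node])}


def fit_action_mixture(transitions, desired, steps=5000, eta=0.5):
    """Find weights w over actions so that sum_a w[a] * transitions[a]
    approximates `desired` in squared error.

    transitions[a][c] is the (hypothetical) probability that action a moves the
    story from the current node to child c. Mass that lands outside `desired`
    is ignored in this sketch. Exponentiated-gradient descent is an assumed
    stand-in heuristic, not the algorithm from the paper.
    """
    actions = list(transitions)
    w = {a: 1.0 / len(actions) for a in actions}  # start from the uniform policy
    for _ in range(steps):
        # Child distribution induced by the current mixture of actions.
        induced = {c: sum(w[a] * transitions[a].get(c, 0.0) for a in actions)
                   for c in desired}
        residual = {c: induced[c] - desired[c] for c in desired}
        # Gradient of sum_c residual[c]**2 with respect to each w[a].
        grad = {a: 2.0 * sum(residual[c] * transitions[a].get(c, 0.0)
                             for c in desired)
                for a in actions}
        # Multiplicative update keeps weights positive; renormalize to sum to 1.
        w = {a: w[a] * math.exp(-eta * grad[a]) for a in actions}
        z = sum(w.values())
        w = {a: v / z for a, v in w.items()}
    return w


# Toy example with made-up numbers: three complete trajectories from "s0".
targets = {("s0", "a", "x"): 0.5, ("s0", "a", "y"): 0.2, ("s0", "b", "z"): 0.3}
children, mass = build_trajectory_tree(targets)
desired = desired_child_distribution(("s0",), children, mass)  # about 0.7 and 0.3

# Two hypothetical stochastic actions available at the root.
transitions = {
    "go_a": {("s0", "a"): 0.9, ("s0", "b"): 0.1},
    "go_b": {("s0", "a"): 0.2, ("s0", "b"): 0.8},
}
policy = fit_action_mixture(transitions, desired)
print(policy)  # go_a close to 5/7, go_b close to 2/7
```

In this toy case an exact answer exists: choosing `go_a` with probability 5/7 yields 0.9·5/7 + 0.2·2/7 = 0.7. When every action deterministically reaches a single child, the fitted mixture simply equals the desired distribution. With stochastic transitions, an exact match may not exist. That is the situation for which the abstract mentions a heuristic. The squared-error objective used here is only one possible choice.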

Similar resources

Authorial Idioms for Target Distributions in TTD-MDPs

In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is a clear choice of reward functions and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a...

Hindsight Optimization for Hybrid State and Action MDPs

Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. ...

Another look at search-based drama management

A drama manager (DM) monitors an interactive experience, such as a computer game, and intervenes to shape the global experience so it satisfies the author’s expressive goals without decreasing a player’s interactive agency. In declarative optimization-based drama management (DODM), the author declaratively specifies desired properties of the experience; the DM optimizes its interventions to max...

Biomechanical Investigation of Empirical Optimal Trajectories Introduced for Snatch Weightlifting

The optimal barbell trajectory for snatch weightlifting has been determined empirically by several researchers. They studied the differences between elite weightlifters’ movement patterns and proposed three optimal barbell trajectories (types A, B, and C), but they did not agree on which trajectory is best. One of the reasons is the idea that the criterion selected by researchers...

Identification of bovine, ovine and caprine pure and binary mixtures of raw and heat processed meats using species specific size markers targeting mitochondrial genome

A specific polymerase chain reaction (PCR) method was applied to identify pure and binary mixtures of raw and heat-processed bovine (Bos taurus), ovine (Ovis aries) and caprine (Capra hircus) meats. These meats are used in food industry products and/or consumed directly. The mitochondrial DNA was amplified as a template in a PCR reaction using specific primers re...

Publication date: 2006